feat(eval): configurable trials, parallelism, and CI threshold support by aroff · Pull Request #122 · gofastskill/fastskill

aroff · 2026-04-20T17:43:45Z

Summary

Adds trials_per_case, parallel, and pass_threshold fields to EvalConfigToml / EvalConfig with backward-compatible defaults (trials=1, threshold=1.0)
Introduces TrialResult and CaseTrialsResult types; trial artifacts written to {run_dir}/{case_id}/trial-N/ with aggregated.json per case
Extends the EvalRunner trait with run_case_trials(), which uses JoinSet + Semaphore for bounded concurrency
Adds --trials, --ci, and --threshold CLI flags to eval run; --ci gates exit code on suite pass rate vs threshold
Emits a cost warning when trials × cases >= 100
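
For readers skimming the summary, the threshold gate and the cost warning can be sketched as follows. This is an illustration only, not the code in this PR: the function names, the exit code of 1, the float tolerance, and the treatment of an empty result set are all assumptions, and the real suite pass rate may be aggregated per case rather than per trial.

```rust
/// Fraction of passing results. An empty slice counts as 0.0 (assumption).
fn pass_rate(passed: &[bool]) -> f64 {
    if passed.is_empty() {
        return 0.0;
    }
    let ok = passed.iter().filter(|p| **p).count();
    ok as f64 / passed.len() as f64
}

/// True when the rate reaches the threshold. The small tolerance guards
/// against float rounding and is an assumption of this sketch.
fn meets_threshold(rate: f64, threshold: f64) -> bool {
    rate + 1e-9 >= threshold
}

/// Hypothetical exit-code mapping: the gate only applies in CI mode.
fn ci_exit_code(ci: bool, rate: f64, threshold: f64) -> i32 {
    if ci && !meets_threshold(rate, threshold) {
        1
    } else {
        0
    }
}

/// Cost warning condition from the summary: trials × cases >= 100.
fn should_warn_cost(trials: usize, cases: usize) -> bool {
    trials.saturating_mul(cases) >= 100
}
```

Under these assumptions, 3 passes out of 5 clears a threshold of 0.6 but not the default of 1.0, and 10 trials over 10 cases would trigger the cost warning.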

Test plan

All 150 unit tests and 48 eval integration tests pass (cargo nextest run -E 'test(eval)')
test_eval_run_trials_threshold_and_ci_exit_semantics — verifies 3/5 pass at threshold=0.6 succeeds; 3/5 at threshold=1.0 fails
test_eval_run_parallelism_reduces_wall_time — 4 trials × 0.5s sleep complete in <1.6s with parallel=4
Snapshot eval_run_help updated to include --trials, --ci, --threshold
Existing single-trial projects continue working unchanged (backward compatible defaults)
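
The bounded concurrency exercised by the parallelism test can be illustrated with a standard-library-only sketch. Note the substitution: the PR uses JoinSet + Semaphore, while this example swaps in scoped threads and a hand-rolled Mutex/Condvar semaphore so it runs without extra crates. The names `Semaphore`, `Permit`, and `run_trials` are hypothetical and do not reflect the PR's actual API.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore built from Mutex + Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

/// Guard that returns its permit when dropped, even if the trial panics.
struct Permit<'a>(&'a Semaphore);

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.permits.lock().unwrap() += 1;
        self.0.cv.notify_one();
    }
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) -> Permit<'_> {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.cv.wait(free).unwrap();
        }
        *free -= 1;
        Permit(self)
    }
}

/// Runs `trials` invocations of `trial`, with at most `parallel` executing
/// at once. Results come back in trial order, not completion order.
fn run_trials<F>(trials: usize, parallel: usize, trial: F) -> Vec<bool>
where
    F: Fn(usize) -> bool + Sync,
{
    // Treat parallel = 0 as 1 so the run cannot deadlock (assumption).
    let sem = Semaphore::new(parallel.max(1));
    thread::scope(|s| {
        let handles: Vec<_> = (0..trials)
            .map(|i| {
                let sem = &sem;
                let trial = &trial;
                s.spawn(move || {
                    let _permit = sem.acquire();
                    trial(i)
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

With `parallel` equal to the number of trials, wall time should approach the duration of a single trial rather than the sum, which is the property the parallelism test above checks (4 trials × 0.5s finishing in under 1.6s).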

…pport Extends the eval system to run multiple trials per case with bounded concurrency and deterministic pass-rate aggregation. Adds --trials, --ci, and --threshold CLI flags; trial artifacts are written under {run_dir}/{case_id}/trial-N/ with aggregated.json summaries. Existing single-trial configs continue working without change.

aroff merged commit 88d1c2b into main Apr 20, 2026
11 checks passed

aroff deleted the feature/049-eval-trials-parallelism branch April 20, 2026 17:53

aroff merged 1 commit into main from feature/049-eval-trials-parallelism